Smart Paper Notes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

translated by Google Translate

BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus

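Josh Meyer , David Ifeoluwa Adelani , Edresson Casanova , Alp Öktem , Daniel Whitenack , Julian Weber , Salomon Kabongo , Elizabeth Salesky , Iroro Orife , Colin Leong , Perez Ogayo

Category: Natural Language Processing

2022-07-07

BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio-quality 48kHz single-speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are: Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba. The corpus is a derivative work of Bible recordings made and released by the Open.Bible project from Biblica. We have aligned, cleaned, and filtered the original recordings, and additionally hand-checked a subset of the alignments for each language. We present results for text-to-speech models with Coqui TTS. The data is released under a commercial-friendly CC-BY-SA license.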

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Jason Alan Fries , Leon Weber , Natasha Seelam , Gabriel Altay , Debajyoti Datta , Samuele Garda , Myungsun Kang , Ruisi Su , Wojciech Kusa , Samuel Cahyawijaya

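Category: Natural Language Processing

2022-06-30

Training and evaluating language models increasingly requires the construction of meta-datasets: diverse collections of curated data with clear provenance. Natural language prompting has recently led to improved zero-shot generalization by transforming existing, supervised datasets into a variety of novel pretraining tasks, highlighting the benefits of meta-dataset curation. While successful for general-domain text, translating these data-centric approaches to biomedical language modeling remains challenging, because labeled biomedical datasets are significantly underrepresented in popular data hubs. To address this challenge, we introduce BigBIO, a community library of 126+ biomedical NLP datasets, currently covering 12 task categories and 10+ languages. BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata, and is compatible with current platforms for prompt engineering and end-to-end few/zero-shot language model evaluation. We discuss our process for task schema harmonization, data auditing, and contribution guidelines, and outline two illustrative use cases: zero-shot evaluation of biomedical prompts and large-scale, multi-task learning. BigBIO is an ongoing community effort and is available at https://github.com/bigscience-workshop/biomedical.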

The Scattering Transform Network with Generalized Morse Wavelets and Its Application to Music Genre Classification

Wai Ho Chak , Naoki Saito , David Weber

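Category: Machine Learning

2022-06-16

We propose to use Generalized Morse Wavelets (GMWs) instead of the commonly used Morlet (or Gabor) wavelets in the Scattering Transform Network (STN), which we call the GMW-STN, for signal classification problems. The GMWs form a parameterized family of truly analytic wavelets, whereas the Morlet wavelets are only approximately analytic. The analyticity of the underlying wavelet filters in the STN is particularly important for nonstationary oscillatory signals such as music signals, because it improves the interpretability of the STN representations by providing multiscale amplitude and phase (and consequently frequency) information about the input signals. We demonstrate the superiority of the GMW-STN over the conventional STN in music genre classification using the so-called GTZAN database. Moreover, we show that the performance of the GMW-STN improves when its number of layers is increased to three from the typical two-layer STN.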
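To make the analyticity claim concrete, here is a minimal sketch of the frequency-domain form commonly given for a generalized Morse wavelet, a(β,γ) · ω^β · exp(−ω^γ) for ω > 0 and zero otherwise. The abstract does not state this formula; the normalization, the function names, and the parameter values β = γ = 3 are assumptions chosen for illustration, and this is not the authors' GMW-STN code.

```python
import math

def gmw_hat(omega: float, beta: float, gamma: float) -> float:
    """Frequency-domain generalized Morse wavelet (hypothetical helper)."""
    if omega <= 0.0:
        # Analytic wavelet: no energy at zero or negative frequencies.
        return 0.0
    # Normalization chosen so that the peak value equals 2.
    a = 2.0 * (math.e * gamma / beta) ** (beta / gamma)
    return a * omega ** beta * math.exp(-(omega ** gamma))

def peak_frequency(beta: float, gamma: float) -> float:
    """Frequency at which gmw_hat attains its maximum."""
    return (beta / gamma) ** (1.0 / gamma)

beta, gamma = 3.0, 3.0  # illustrative parameters, not taken from the paper
peak = peak_frequency(beta, gamma)
peak_value = gmw_hat(peak, beta, gamma)
negative_value = gmw_hat(-peak, beta, gamma)
```

The filter vanishes exactly on the negative half-axis, which is the property the abstract contrasts with the only approximately analytic Morlet wavelet.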

Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation

Tobias Weber , Michael Ingrisch , Bernd Bischl , David Rügamer

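Category: Machine Learning

2021-10-21

The application of deep learning in survival analysis (SA) makes it possible to use unstructured and high-dimensional data types that are uncommon in traditional survival methods. This advances methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and less intuitive models because of the black-box character of deep-learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with a survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that allows hazard factors to be modeled in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project the changes back to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.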

Survival-oriented embeddings for improving accessibility to complex data structures

Tobias Weber , Michael Ingrisch , Matthias Fabritius , Bernd Bischl , David Rügamer

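Category: Machine Learning

2021-10-21

Deep learning excels at analyzing unstructured data, and recent advances allow these techniques to be extended to survival analysis. In the context of clinical radiology, this makes it possible, for example, to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and to support clinical decision making. Medical applications, however, are associated with high criticality, and consequently neither medical personnel nor patients usually accept black-box models as a reason or basis for decisions. Apart from aversion to new technologies, this is due to the missing interpretability, transparency, and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant to healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.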

PennyLane: Automatic differentiation of hybrid quantum-classical computations

Ville Bergholm , Josh Izaac , Maria Schuld , Christian Gogolin , Shahnawaz Ahmed , Vishnu Ajith , M. Sohaib Alam , Guillermo Alonso-Linaje , B. AkashNarayanan , Ali Asadi

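Category: Machine Learning

2018-11-12

PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including the Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical side, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used to optimize variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.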
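As a rough illustration of what "gradients of variational quantum circuits" can mean, the sketch below applies the parameter-shift rule to a one-qubit circuit simulated analytically in plain Python. It deliberately does not use the PennyLane API; the abstract does not name the parameter-shift rule, so treating it as the gradient method is an assumption, and all function names, the starting angle, and the learning rate are hypothetical.

```python
import math

def expval_z_after_rx(theta: float) -> float:
    """<Z> after applying RX(theta) to |0>, computed from the outcome probabilities."""
    p0 = math.cos(theta / 2) ** 2  # probability of measuring |0>
    p1 = math.sin(theta / 2) ** 2  # probability of measuring |1>
    return p0 - p1                 # equals cos(theta)

def parameter_shift_grad(f, theta: float) -> float:
    """Gradient from two shifted circuit evaluations (shift of pi/2)."""
    shift = math.pi / 2
    return (f(theta + shift) - f(theta - shift)) / 2.0

theta = 0.7  # illustrative starting angle
grad = parameter_shift_grad(expval_z_after_rx, theta)

# Plain gradient descent on the circuit parameter, minimizing <Z>.
learning_rate = 0.4
for _ in range(100):
    theta -= learning_rate * parameter_shift_grad(expval_z_after_rx, theta)
final_value = expval_z_after_rx(theta)
```

Because the gradient is obtained purely from circuit evaluations, the same loop could in principle be driven by a hardware device instead of the analytic simulator used here.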

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

S. M. Ali Eslami , Nicolas Heess , Theophane Weber , Yuval Tassa , David Szepesvari , Koray Kavukcuoglu , Geoffrey E. Hinton

Category:

2016-03-28

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects (counting, locating and classifying the elements of a scene) without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network at unprecedented speed. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.

Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, while it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. Then, INVALIDATOR determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains erroneous behaviors of the original buggy program. In case our approach fails to identify an overfitting patch based on invariants, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR is able to leverage both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; instead, it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We have conducted our experiments on a dataset of 885 patches generated on real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
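The two-condition overfitting rule described above can be sketched with plain sets of invariant strings. This is only an illustration under stated assumptions: the abstract does not define how "correct specifications" and "erroneous behaviors" are derived, so here they are assumed to be, respectively, the invariants shared by the buggy and developer-patched programs and the invariants that hold only on the buggy program. The function name and the sample invariants are hypothetical, and no invariant inference tool is actually invoked.

```python
from typing import Set

def classify_patch(buggy: Set[str], developer: Set[str], patched: Set[str]) -> str:
    """Apply the two invariant-based overfitting conditions (hypothetical sketch)."""
    # Assumption: invariants kept by the developer fix approximate correct specifications.
    correct_specs = buggy & developer
    # Assumption: invariants removed by the developer fix approximate erroneous behaviors.
    error_behaviors = buggy - developer

    violates_correct = not correct_specs <= patched     # condition (1)
    maintains_error = bool(error_behaviors & patched)   # condition (2)
    if violates_correct or maintains_error:
        return "overfitting"
    # Invariants are inconclusive; a syntax-based classifier would take over here.
    return "undetermined"

# Hypothetical invariants inferred from each program version.
buggy_inv = {"idx >= 0", "result == -1"}
developer_inv = {"idx >= 0", "result >= 0"}

verdict_keeps_bug = classify_patch(buggy_inv, developer_inv, {"idx >= 0", "result == -1"})
verdict_breaks_spec = classify_patch(buggy_inv, developer_inv, {"result >= 0"})
verdict_unclear = classify_patch(buggy_inv, developer_inv, {"idx >= 0", "result >= 0"})
```

The "undetermined" branch marks where, per the abstract, the trained syntax-based model would be consulted.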

Conservation Tools: The Next Generation of Engineering--Biology Collaborations

Andrew Schulz , Cassie Shriver , Suzanne Stathatos , Benjamin Seleb , Emily Weigel , Young-Hui Chang , M. Saad Bhamla , David Hu , Joseph R. Mendelson III

Category: Machine Learning

2023-01-03

The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
